Skip to content

Conversation

masseyke
Copy link
Member

@masseyke masseyke commented Sep 16, 2024

This is the first of a handful of PRs that will result in the addition of component template substitutions support to the simulate ingest API. See the draft PR at #112762 for the full idea.
This PR adds two methods to the BulkRequest API that will be used by the component template substitution code -- getComponentTemplateSubstitutions() and shallowClone(). Both will be used in a future update to TransportAbstractBulkAction.
The next PR will add the transport changes to take advantage of these changes, and the final one will add the full documentation and rest-level testing (this PR updates RestSimulateIngestAction to pass the new SimulateBulkRequest constructor arg in order to compile, but there is no actual additional functionality at the rest layer).

@masseyke masseyke added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.16.0 v9.0.0 labels Sep 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke marked this pull request as ready for review September 16, 2024 19:28
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Sep 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Comment on lines 492 to 497
public BulkRequest shallowClone() {
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.setRefreshPolicy(getRefreshPolicy());
bulkRequest.waitForActiveShards(waitForActiveShards());
bulkRequest.timeout(timeout());
return bulkRequest;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BulkRequest has many other things than only the refresh policy, active shards, and timeout. Are we going to copy the rest of them also? Should the javadoc reflect what's being copied?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went his way because this is what BulkRequestModifier did (this code was moved out of there). I wavered on whether I ought to add the other fields or not, and wound up just repeating and testing the previous behavior. Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just documenting it in the javadoc is fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wound up adding the rest of the fields just in case (I don't think they'd intentionally been left out -- I think they were just added after the BulkRequestModifier method had been written).

Comment on lines 154 to 155
Map<String, AliasMetadata> aliases = null;
DataStreamLifecycle lifecycle = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to cause problems with it having unconsumed xcontent? Should we be throwing an exception for all other keys other than mappings and settings to show the user that those cannot be simulated? Or should we keep the leniency?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if we're ignoring eventual data_stream contents, then eventually from #98877 which wants to:

Figure out how to specify tsdb settings in component templates. For example index.routing_path can be specified in a composable index template if data stream template' index_mode is set to time_series.

Then we may need the whole thing for validation? This may be too premature, I'm just curious right now. Would it make sense to just load the whole thing into a Template from the JSON?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess there's no harm in just using it all. Plus it simplifies my code. At this point though, I'm working with a Map -- think I ought to just pass it through all the way as xcontent? It's not really performance-sensitive since this is simulate code. And it's convenient for the sake of testing to have these overrides as a Map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is that if ComponentTemplate.parse ever changes to do some kind of validation, this would pick that up at least

try (var parser = XContentHelper.mapToXContentParser(XContentParserConfiguration.EMPTY, rawTemplate)) {
componentTemplate = ComponentTemplate.parse(parser);
} catch (IOException e) {
throw new AssertionError(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to throw an assertion error here, as that would kill the ES node, we can probably throw an exception that will fail the simulation though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. This was not supposed to be an AssertionError!

@masseyke masseyke requested a review from dakrone September 17, 2024 16:09
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@masseyke masseyke merged commit 74c485a into elastic:main Sep 17, 2024
15 checks passed
@masseyke masseyke deleted the add-new-BulkRequest-methods-for-simulate branch September 17, 2024 18:19
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 112957

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue Team:Data Management Meta label for data/management team v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants